Add RegEx supports using RE2 to sjsonnet #244

Merged
merged 10 commits into from
Dec 31, 2024

Conversation

stephenamar-db
Collaborator

@stephenamar-db stephenamar-db commented Dec 19, 2024

With this PR, I'm adding a handful of functions that expose regular expressions in Jsonnet through std.native().
I'm modeling them after this open PR from jsonnet, which was ported to jrsonnet.

For now, they live in std.native() because they are not part of the default std package, and they use RE2 instead of the native regexp package (for performance, and for compatibility with a future go-jsonnet implementation).

  • regexFullMatch(pattern, str) -- Full match regex
  • regexPartialMatch(pattern, str) -- Partial match regex
  • regexReplace(str, pattern, to) -- Replace a single occurrence using regex
  • regexGlobalReplace(str, pattern, to) -- Replace globally using regex

and the utility function:

  • regexQuoteMeta(str) -- Escape regex metacharacters

Those functions return an object:

std.native("regexFullMatch")("h(?P<mid>.*)o", "hello")

{
   "captures": [
      "ell"
   ],
   "string": "hello"
}
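
To make the semantics of the five functions concrete, here is a rough sketch in Python. It is not the sjsonnet implementation: Python's backtracking `re` module stands in for RE2, the snake_case function names are hypothetical, returning `None` when nothing matches is an assumption, and so is the choice to put the matched text in "string" for a partial match.

```python
import re


def _result(match):
    # Mirror the object shape shown above. None for "no match" is an assumption.
    if match is None:
        return None
    return {"captures": list(match.groups()), "string": match.group(0)}


def regex_full_match(pattern, s):
    # The pattern must cover the whole input.
    return _result(re.fullmatch(pattern, s))


def regex_partial_match(pattern, s):
    # The pattern may match anywhere inside the input.
    return _result(re.search(pattern, s))


def regex_replace(s, pattern, to):
    # Replace only the first occurrence.
    return re.sub(pattern, to, s, count=1)


def regex_global_replace(s, pattern, to):
    # Replace every occurrence.
    return re.sub(pattern, to, s)


def regex_quote_meta(s):
    # Escape metacharacters so the text matches itself literally.
    return re.escape(s)


print(regex_full_match(r"h(?P<mid>.*)o", "hello"))
# {'captures': ['ell'], 'string': 'hello'}
print(regex_replace("hello", "l", "L"))         # heLlo
print(regex_global_replace("hello", "l", "L"))  # heLLo
```

Note the argument order: the match functions take the pattern first, while the replace functions take the input string first, as in the list above.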

This PR does not add support for the "namedCaptures" return field due to some complications with Scala.js and Scala Native. Both platforms use the JDK Pattern class (Scala.js backed by ECMAScript regexes and Scala Native backed by RE2(!)), but before JDK 20 the Pattern class has no straightforward way to list group names without additional hacks. This will be dealt with in a follow-up PR.
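
For illustration only, this is roughly what a "namedCaptures" field needs from the regex engine: a way to enumerate group names for a compiled pattern. The sketch uses Python's `re` module as a stand-in, where `groupindex` provides that list. The name-to-text shape of the field is an assumption, not something this PR defines.

```python
import re

pattern = re.compile(r"h(?P<mid>.*)o")

# groupindex maps each group name to its group number.
group_numbers = dict(pattern.groupindex)

match = pattern.fullmatch("hello")

# Assumed shape for "namedCaptures": group name -> captured text.
named_captures = {name: match.group(name) for name in pattern.groupindex}

print(group_numbers)   # {'mid': 1}
print(named_captures)  # {'mid': 'ell'}
```

Without an equivalent of `groupindex`, the names would have to be recovered some other way, for example by parsing the pattern text.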

This PR also adds the ability to cache patterns, and refactors all users of regexes to use it.
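
A pattern cache of this kind can be sketched as a map from pattern string to compiled pattern. The Python below is a minimal illustration under that assumption: the `PatternCache` name and its `get` method are hypothetical and do not reflect the Scala code in this PR.

```python
import re


class PatternCache:
    """Compile each distinct pattern string once and reuse the result."""

    def __init__(self):
        self._compiled = {}

    def get(self, pattern):
        compiled = self._compiled.get(pattern)
        if compiled is None:
            # First use of this pattern: compile and remember it.
            compiled = re.compile(pattern)
            self._compiled[pattern] = compiled
        return compiled


cache = PatternCache()
first = cache.get("l+")
second = cache.get("l+")

print(first is second)           # True
print(first.sub("L", "hello"))   # heLo
```

Callers that use the same pattern repeatedly can also hold on to the returned object instead of calling `get` each time, which avoids the lookup entirely.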

@He-Pin
Contributor

He-Pin commented Dec 19, 2024

Is RE2/J's performance better?

@stephenamar-db
Collaborator Author

Is RE2/J's performance better?

RE2 is a regular expression engine that runs in time linear in the size of the input. RE2/J is a port of the C++ library [RE2](https://github.com/google/re2) to pure Java.

It's also what's used in Scala Native.
https://scala-native.org/en/latest/lib/javalib.html#regular-expressions-java-util-regex

@He-Pin
Contributor

He-Pin commented Dec 19, 2024

Seems like we could have a Scala port of it, then Scala.js and Scala Native could use it too.

@He-Pin
Contributor

He-Pin commented Dec 29, 2024

Is there a test suite for regex? This might change some behavior or results, e.g. where the default behavior has changed.

@stephenamar-db stephenamar-db changed the title from "STDREGEX" to "Add RegEx supports using RE2 to sjsonnet" Dec 29, 2024
@stephenamar-db stephenamar-db marked this pull request as ready for review December 29, 2024 04:20
@stephenamar-db
Collaborator Author

Is there a test suite for regex? This might change some behavior or results, e.g. where the default behavior has changed.

I added a test suite. Is there something else you are thinking about?

Contributor

@JoshRosen JoshRosen left a comment


Looks good overall, just minor comments on a typo and some suggestions to save on cache lookups in a couple of hot paths.

Review comments (all outdated and resolved):

  • sjsonnet/src/sjsonnet/YamlRenderer.scala
  • sjsonnet/src/sjsonnet/PrettyYamlRenderer.scala
  • sjsonnet/src/sjsonnet/Std.scala
  • sjsonnet/src/sjsonnet/Std.scala
  • sjsonnet/src/sjsonnet/StdRegex.scala

@stephenamar-db stephenamar-db merged commit 729e0d8 into master Dec 31, 2024
6 checks passed
@stephenamar-db stephenamar-db deleted the regexp branch December 31, 2024 04:47
3 participants